Abstract. Using elementary results from Fourier analysis, we provide an alternate proof of a necessary and sufficient condition for the sum of $M$ independent continuous random variables modulo 1 to converge to the uniform distribution in $L^1([0,1])$, and discuss generalizations to discrete random variables. A consequence is that if $X_1, \dots, X_M$ are independent continuous random variables with densities $f_1, \dots, f_M$, then for any base $B$ and for many choices of the densities, the distribution of the digits of $X_1 \cdots X_M$ converges to Benford's law base $B$ as $M \to \infty$. The rate of convergence can be quantified in terms of the Fourier coefficients of the densities, and provides an explanation for the prevalence of Benford behavior in many diverse systems. To highlight the difference in behavior between identically and non-identically distributed random variables, we construct a sequence of densities $\{f_i\}$ with the following properties: (1) for each $i$, if every $X_k$ is independently chosen with density $f_i$, then the sum converges to the uniform distribution; (2) if the $X_k$'s are independent but non-identical, with $X_k$ having distribution $f_k$, then the sum does not converge to the uniform distribution.



1. Introduction 

We investigate necessary and sufficient conditions for the distribution of 
a sum of random variables modulo 1 to converge to the uniform distribu- 
tion. This topic has been fruitfully studied by many previous researchers. 
Our purpose here is to provide an elementary proof of prior results, and 
explicitly connect this problem to related problems in the Benford's Law 
literature concerning the distribution of the leading digits of products of 
random variables. As this question has motivated much of the research on 
this topic, we briefly describe that problem and its history, and then state 
our results. 

For any base $B$ we may uniquely write a positive $x \in \mathbb{R}$ as $x = M_B(x) \cdot B^k$, where $k \in \mathbb{Z}$ and $M_B(x)$ (called the mantissa) is in $[1, B)$. A sequence of positive numbers $\{a_n\}$ is said to be Benford base $B$ (or to satisfy Benford's Law base $B$) if the probability of observing a base-$B$ mantissa of $a_n$ of at most $s$ is $\log_B s$. More precisely,

$$\lim_{N\to\infty} \frac{\#\{n \le N : 1 \le M_B(a_n) \le s\}}{N} = \log_B s. \qquad (1.1)$$

Benford behavior for continuous systems is defined analogously. Thus, base 10, the probability of observing a first digit of $j$ is $\log_{10}(j+1) - \log_{10}(j)$, implying that about 30% of the time the first digit is a 1.
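As a quick numerical illustration of these definitions, the following minimal Python sketch computes the base-$B$ mantissa $M_B(x)$ and the Benford first-digit probabilities $\log_B(j+1) - \log_B(j)$. The helper names `mantissa` and `benford_first_digit_probs` are hypothetical and not taken from the source.

```python
import math


def mantissa(x: float, base: float = 10.0) -> float:
    """Return M_B(x) in [1, B), where x = M_B(x) * B**k for some integer k."""
    if x <= 0:
        raise ValueError("x must be positive")
    k = math.floor(math.log(x, base))
    m = x / base ** k
    # Correct for floating-point error in the logarithm near powers of the base.
    if m >= base:
        m /= base
    elif m < 1:
        m *= base
    return m


def benford_first_digit_probs(base: int = 10) -> dict:
    """P(first digit = j) = log_B(j + 1) - log_B(j) for j = 1, ..., B - 1."""
    return {j: math.log(j + 1, base) - math.log(j, base) for j in range(1, base)}


probs = benford_first_digit_probs()
print(round(mantissa(2024.0), 6), round(probs[1], 4))  # probs[1] is about 0.30
```

The probabilities telescope, so they sum to $\log_B B - \log_B 1 = 1$.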

Benford's Law was first observed by Newcomb in the 1880s, who noticed that pages of numbers starting with 1 in logarithm tables were significantly more worn than those starting with 9. In 1938 Benford [Ben] observed the same digit bias in 20 different lists with over 20,000 numbers in all. See [Hi1, Rai] for a description and history. Many diverse systems have been shown to satisfy Benford's law, ranging from recurrence relations [BrDu] to $n!$ and $\binom{n}{k}$ ($0 \le k \le n$) [Dia] to iterates of power, exponential and rational maps [BBH, Hi2] to values of $L$-functions near the critical line and characteristic polynomials of random matrix ensembles [KoMi] to iterates of the $3x+1$ Map [KoMi, LS] to differences of order statistics [MN]. There are numerous applications of Benford's Law. It is observed in natural systems ranging from hydrology data [NM] to stock prices [Ley], and is used in computer science in analyzing round-off errors (see page 255 of [Knu] and [BH]), in determining the optimal way to store numbers¹ [Ha], and in accounting to detect tax fraud [Nig1, Nig2]. See [Hu] for a detailed bibliography of the field.

In this paper we consider the distribution of digits of products of independent random variables, $X_1 \cdots X_M$, and the related questions about probability densities of random variables modulo 1. Many authors [Sa, ST, AS, Adh, Ha, Tu] have observed that the product (and more generally, any nice arithmetic operation) of two random variables is often closer to satisfying Benford's law than the input random variables; further, as the number of terms increases, the resulting expression seems to approach Benford's Law.

Many of the previous works are concerned with determining exact formulas for the distribution of $X_1 \cdots X_M$; however, to understand the distribution of the digits all we need is to understand $\log_B|X_1 \cdots X_M| \bmod 1$. This leads to the equivalent problem of studying sums of random variables modulo 1, a formulation ideally suited for Fourier analysis. The main result is a variant of the Central Limit Theorem, which in this context states that for "nice" random variables, as $M \to \infty$ the sum² of $M$ independent random variables modulo 1 tends to the uniform distribution; by simple exponentiation this is equivalent to Benford's Law for the product (see [Dia]). To emphasize the similarity to the standard Central Limit Theorem and the fact that our sums are modulo 1, we refer to such results as Modulo 1 Central Limit Theorems. Many authors [Bh, Bo, ..., Sc1, Sc2, Sc3] have analyzed this problem in various settings and generalizations, obtaining sufficient conditions on the random variables (often identically distributed) as well as estimates on the rate of convergence.

1 If the data is distributed according to Benford's Law base 2, the probability of having to shift the result of multiplying two numbers if the mantissas are written as $0.x_1x_2x_3\cdots$ is about .38; if they are written as $x_1.x_2x_3\cdots$ the probability is about .62.
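The reduction from digits of a product to a sum modulo 1 can be seen numerically. In the minimal Python sketch below (an illustration under our own assumptions, not code from the source), each $X_m$ is taken to be uniform on $(0,1]$; we add $\log_{10} X_m$ rather than multiply, reduce modulo 1, and read off the leading digit. The function names and the choice of uniform factors are hypothetical.

```python
import math
import random
from collections import Counter


def leading_digit_from_log10(log_value: float) -> int:
    """Leading base-10 digit of 10**log_value, computed from log_value mod 1 only."""
    frac = log_value % 1.0            # log10 of the mantissa, in [0, 1)
    return min(int(10 ** frac), 9)    # clamp in case rounding pushes 10**frac up to 10


def simulate_product_digits(num_factors: int, trials: int, seed: int = 0) -> dict:
    """Empirical first-digit frequencies of X_1 * ... * X_M with X_m ~ Uniform(0, 1]."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        # Summing logarithms avoids underflow of the product itself.
        s = sum(math.log10(1.0 - rng.random()) for _ in range(num_factors))
        counts[leading_digit_from_log10(s)] += 1
    return {d: counts[d] / trials for d in range(1, 10)}


freqs = simulate_product_digits(num_factors=10, trials=50_000, seed=1)
for d in range(1, 10):
    print(d, round(freqs[d], 3), round(math.log10(1 + 1 / d), 3))
```

Only the fractional part of the log-sum is ever used, which is exactly the point of the reformulation.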

Our main result is a proof, using only elementary results from Fourier analysis, of a necessary and sufficient condition for a sum modulo 1 to converge to the uniform distribution in $L^1([0,1])$. We also give a specific example to emphasize the different behavior possible when the random variables are not identically distributed. We let $\hat g_m(n)$ denote the $n$th Fourier coefficient of a probability density $g_m$ on $[0,1]$:

$$\hat g_m(n) = \int_0^1 g_m(x)\, e^{-2\pi i n x}\,dx. \qquad (1.2)$$
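Definition (1.2) can be approximated directly by quadrature. The sketch below (the function name and the number of points are our own choices) uses the midpoint rule and checks it on the illustrative density $g(x) = 2x$, for which integration by parts gives $\hat g(0) = 1$ and $\hat g(n) = i/(\pi n)$ for $n \neq 0$.

```python
import cmath


def fourier_coefficient(g, n: int, num_points: int = 10_000) -> complex:
    """Midpoint-rule approximation of hat{g}(n) = int_0^1 g(x) exp(-2 pi i n x) dx."""
    h = 1.0 / num_points
    total = 0j
    for k in range(num_points):
        x = (k + 0.5) * h
        total += g(x) * cmath.exp(-2j * cmath.pi * n * x)
    return total * h


def g(x: float) -> float:
    """Example probability density on [0, 1]: g(x) = 2x."""
    return 2.0 * x


print(fourier_coefficient(g, 0), fourier_coefficient(g, 1))
```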

Theorem 1.1 (The Modulo 1 Central Limit Theorem for Independent Continuous Random Variables). Let $\{Y_m\}$ be independent continuous random variables on $[0,1)$, not necessarily identically distributed, with densities $\{g_m\}$. A necessary and sufficient condition for the sum $Y_1 + \cdots + Y_M$ modulo 1 to converge to the uniform distribution as $M \to \infty$ in $L^1([0,1])$ is that for each $n \neq 0$ we have $\lim_{M\to\infty} \hat g_1(n) \cdots \hat g_M(n) = 0$.
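To see the criterion at work in the simplest identically distributed case, the sketch below assumes (for illustration only) that every $g_m$ is the uniform density $2\cdot\mathbf{1}_{[0,1/2)}$, whose coefficients are $\hat g(n) = (1-(-1)^n)/(\pi i n)$ for $n \neq 0$. Since $|\hat g(n)| \le 2/\pi < 1$ for $n \neq 0$, the products $\hat g(n)^M$ tend to zero. By Parseval's identity $\|h_M - 1\|_2^2 = \sum_{n \neq 0} |\hat g(n)|^{2M}$, and by Cauchy–Schwarz this $L^2$ distance bounds the $L^1$ distance from the uniform density. The function names and the truncation `max_freq` are our own choices.

```python
import math


def coeff_half_uniform(n: int) -> complex:
    """Fourier coefficients of g = 2 * 1_[0, 1/2): 1 at n = 0, 2/(pi i n) for odd n, 0 otherwise."""
    if n == 0:
        return 1.0
    if n % 2 == 0:
        return 0.0
    return 2 / (math.pi * 1j * n)


def l2_distance_from_uniform(coeff, num_summands: int, max_freq: int = 100_000) -> float:
    """sqrt(sum_{0 < |n| <= max_freq} |coeff(n)|^(2M)), i.e. ||h_M - 1||_2 up to truncation."""
    total = 0.0
    for n in range(1, max_freq + 1):
        # For a real density |coeff(-n)| = |coeff(n)|, so n and -n contribute equally.
        total += 2 * abs(coeff(n)) ** (2 * num_summands)
    return math.sqrt(total)


for M in (1, 2, 3, 5):
    print(M, l2_distance_from_uniform(coeff_half_uniform, M))
```

For $M = 1$ and $M = 2$ the sums can be done by hand ($\sum_{n\ \mathrm{odd}} n^{-2} = \pi^2/8$, $\sum_{n\ \mathrm{odd}} n^{-4} = \pi^4/96$), giving distances $1$ and $1/\sqrt{3}$.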

As Benford's Law is equivalent to the associated base $B$ logarithm being equidistributed modulo 1 (see [Dia]), from Theorem 1.1 we immediately obtain the following result on the distribution of digits of a product.

Theorem 1.2. Let $X_1, \dots, X_M$ be independent continuous random variables, and let $g_{B,m}$ be the density of $\log_B M_B(|X_m|)$. A necessary and sufficient condition for the distribution of the digits of $X_1 \cdots X_M$ to converge to Benford's Law (base $B$) as $M \to \infty$ in $L^1([0,1])$ is that for each $n \neq 0$, $\lim_{M\to\infty} \hat g_{B,1}(n) \cdots \hat g_{B,M}(n) = 0$.

As other authors have noticed, the importance of results such as Theorem 1.2 is that they give an explanation of why so many data sets follow Benford's Law (or at least a close approximation to it). Specifically, if we can consider the observed values of a system to be the product of many independent processes with reasonable densities, then the distribution of the digits of the resulting product will be close to Benford's Law.

2 That is, we study sums of the form $Y_1 + \cdots + Y_M$. For the standard Central Limit Theorem one studies $\frac{Y_1 + \cdots + Y_M - M\cdot\mathrm{Mean}}{\mathrm{StDev}\cdot\sqrt{M}}$: we subtract the mean and divide by the standard deviation to obtain a quantity which will be finite as $M \to \infty$; however, sums modulo 1 are a priori finite, and thus their unscaled value is of interest.
We briefly compare our approach with other proofs of results such as Theorem 1.1 (where the random variables are often taken as identically distributed). If the random variables are identically distributed with density $g$, our condition reduces to $|\hat g(n)| < 1$ for $n \neq 0$. For a probability distribution and a given $n \neq 0$, $|\hat g(n)| = 1$ if and only if there exists $\alpha \in \mathbb{R}$ such that all the mass is contained in the set $\{\alpha, \alpha + \frac{1}{|n|}, \dots, \alpha + \frac{|n|-1}{|n|}\}$ (modulo 1). (As we are assuming our random variables are continuous and not discrete, the corresponding densities are in $L^1([0,1])$ and this condition is not met; in Theorem 1.3 we discuss generalizations to discrete random variables.) In other words, the sum of identically distributed random variables modulo 1 converges to the uniform distribution if and only if the support of the distribution is not contained in a coset of a finite subgroup of the circle group $[0,1)$. Interestingly, Lévy [Lev] proved this just one year after Benford's paper [Ben], though his paper does not study digits. Lévy's result has been generalized to other compact groups, with estimates on the rate of convergence [Bh]. Stromberg [Str] proved that³ the $n$-fold convolution of a regular probability measure on a compact Hausdorff group $G$ converges to the normalized Haar measure in the weak-star topology if and only if the support of the distribution is not contained in a coset of a proper normal closed subgroup of $G$.
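The coset condition can be checked directly for a discrete distribution. In this illustrative sketch (the atoms, weights, and function name are hypothetical choices), the mass sits on the coset $\alpha + \{0, 1/3, 2/3\}$ with $\alpha = 0.1$. Then $|\hat g(3)| = 1$, so $|\hat g(3)|^M = 1$ for every $M$ and the sum modulo 1 cannot become uniform, even though $|\hat g(1)| < 1$.

```python
import cmath


def discrete_coefficient(atoms, weights, n: int) -> complex:
    """hat{g}(n) = sum_k w_k exp(-2 pi i n a_k) for point masses of weight w_k at a_k."""
    return sum(w * cmath.exp(-2j * cmath.pi * n * a) for a, w in zip(atoms, weights))


alpha, q = 0.1, 3
atoms = [(alpha + k / q) % 1.0 for k in range(q)]   # the coset alpha + {0, 1/3, 2/3}
weights = [0.5, 0.3, 0.2]

print(abs(discrete_coefficient(atoms, weights, 3)))  # lies on the unit circle
print(abs(discrete_coefficient(atoms, weights, 1)))  # strictly less than 1
```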

Our arguments in the proof of Theorem 1.1 may be generalized to independent discrete random variables, at the cost of replacing $L^1$-convergence with weak convergence. Below $\delta_\alpha(x)$ denotes a unit point mass at $\alpha$.

Theorem 1.3 (Modulo 1 Central Limit Theorem for Certain Independent Discrete Random Variables). Let $\{Y_m\}$ be independent discrete random variables on $[0,1)$, not necessarily identically distributed, with densities

$$g_m(x) = \sum_{k=1}^{r_m} w_{k,m}\,\delta_{\alpha_{k,m}}(x), \qquad w_{k,m} \ge 0, \quad \sum_{k=1}^{r_m} w_{k,m} = 1. \qquad (1.3)$$

Assume that there is a finite set $A \subset [0,1)$ such that all $\alpha_{k,m} \in A$. A necessary and sufficient condition for the sum $Y_1 + \cdots + Y_M$ modulo 1 to converge weakly to the uniform distribution as $M \to \infty$ is that for each $n \neq 0$ we have $\lim_{M\to\infty} \hat g_1(n) \cdots \hat g_M(n) = 0$.

In §2 we prove Theorem 1.1 using only elementary facts from Fourier analysis, showing our condition is a consequence of Lebesgue's Theorem (on $L^1$-convergence of the Fejér series) and a standard approximation argument. We give an example of distinct densities $\{f_i\}$ with the following properties: (1) for each $i$, if every $X_k$ is independently chosen with density $f_i$, then the sum converges to the uniform distribution; (2) if the $X_k$'s are independent but non-identical, with $X_k$ having distribution $f_k$, then the sum does not converge to the uniform distribution. This example illustrates the difference in behavior when the random variables are not identically distributed: to obtain uniform behavior for the sum it does not suffice for each random variable to satisfy Lévy's or Stromberg's condition (the distribution is not concentrated on a coset of a finite subgroup of $[0,1)$). We conclude in §3 by sketching the proof of Theorem 1.3, and in Appendix A we comment on alternate techniques to prove results such as Theorem 1.2 (in particular, why our arguments are more general than applying the standard Central Limit Theorem to $\log_B|X_1| + \cdots + \log_B|X_M|$ to analyze the distribution of digits of $|X_1 \cdots X_M|$).

3 The following formulation is taken almost verbatim from the first paragraph of [Bh].



2. Analysis of Sums of Continuous Random Variables 

We recall some standard facts from Fourier analysis (see for example [SS]). The convolution of two functions in $L^1([0,1])$ is

$$(f*g)(x) = \int_0^1 f(y)\,g(x-y)\,dy = \int_0^1 f(x-y)\,g(y)\,dy, \qquad (2.1)$$

where $x - y$ is taken modulo 1. Convolution is commutative and associative, and the $n$th Fourier coefficient of a convolution is the product of the two $n$th Fourier coefficients.

Let $g_1$ and $g_2$ be two probability densities in $L^1([0,1])$. If $Z_1$ and $Z_2$ are independent random variables on $[0,1)$ with densities $g_1$ and $g_2$, then the density of $Z_1 + Z_2 \bmod 1$ is the convolution of $g_1$ with $g_2$.
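A discretized version of (2.1) makes the statement about $Z_1 + Z_2 \bmod 1$ concrete. The sketch below (grid size, function names, and the choice of densities are our own assumptions) samples densities at the points $j/n$, convolves them with indices taken modulo $n$, and compares the result for two copies of the uniform density on $[0,1/2)$ with the exact triangular density, equal to $4x$ on $[0,1/2]$ and $4(1-x)$ on $[1/2,1]$.

```python
def circular_convolution(f_vals, g_vals):
    """Riemann-sum approximation of (f * g)(j/n) = int_0^1 f(y) g(j/n - y) dy, indices mod n."""
    n = len(f_vals)
    h = 1.0 / n
    return [h * sum(f_vals[k] * g_vals[(j - k) % n] for k in range(n)) for j in range(n)]


def half_uniform(x: float) -> float:
    """Density of the uniform distribution on [0, 1/2)."""
    return 2.0 if x < 0.5 else 0.0


def triangle(x: float) -> float:
    """Exact density of Z_1 + Z_2 for independent Z_i uniform on [0, 1/2)."""
    return 4 * x if x <= 0.5 else 4 * (1 - x)


n = 400
grid = [j / n for j in range(n)]
g_vals = [half_uniform(x) for x in grid]
conv = circular_convolution(g_vals, g_vals)
print(max(abs(c - triangle(x)) for c, x in zip(conv, grid)))  # discretization error, O(1/n)
print(sum(conv) / n)                                          # total mass of the result
```

The $O(n^2)$ double loop is chosen for transparency; an FFT-based convolution would be the usual choice for large grids.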

Definition 2.1 (Fejér kernel, Fejér series). Let $f \in L^1([0,1])$. The $N$th Fejér kernel is

$$F_N(x) = \sum_{n=-N}^{N}\left(1 - \frac{|n|}{N}\right)e^{2\pi i n x}, \qquad (2.2)$$

and the $N$th Fejér series of $f$ is

$$T_N f(x) = (f * F_N)(x) = \sum_{n=-N}^{N}\left(1 - \frac{|n|}{N}\right)\hat f(n)\,e^{2\pi i n x}. \qquad (2.3)$$

The Fejér kernels are an approximation to the identity (they are non-negative, integrate to 1, and for any $\delta \in (0, 1/2)$ we have $\lim_{N\to\infty}\int_\delta^{1-\delta} F_N(x)\,dx = 0$).
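These properties of the Fejér kernel are easy to check numerically. The sketch below (function names and sample parameters are hypothetical) evaluates $F_N$ from (2.2), compares it with the standard closed form $F_N(x) = \sin^2(\pi N x)/(N\sin^2(\pi x))$, valid for the normalization $1 - |n|/N$ used here, and approximates $\int_0^1 F_N$ and $\int_\delta^{1-\delta} F_N$ (with $\delta = 0.1$) by the midpoint rule.

```python
import math


def fejer_kernel(x: float, N: int) -> float:
    """F_N(x) = sum_{|n| <= N} (1 - |n|/N) e^{2 pi i n x}, written with cosines since it is real."""
    return 1.0 + 2.0 * sum((1 - n / N) * math.cos(2 * math.pi * n * x) for n in range(1, N + 1))


def fejer_closed_form(x: float, N: int) -> float:
    """Closed form sin^2(pi N x) / (N sin^2(pi x)), valid when x is not an integer."""
    return math.sin(math.pi * N * x) ** 2 / (N * math.sin(math.pi * x) ** 2)


def midpoint_integral(func, a: float, b: float, points: int = 2000) -> float:
    """Midpoint-rule approximation of int_a^b func(x) dx."""
    h = (b - a) / points
    return h * sum(func(a + (k + 0.5) * h) for k in range(points))


for N in (5, 20, 80):
    total = midpoint_integral(lambda x: fejer_kernel(x, N), 0.0, 1.0)
    tail = midpoint_integral(lambda x: fejer_kernel(x, N), 0.1, 0.9)   # delta = 0.1
    print(N, round(total, 9), round(tail, 4))
```

Because $F_N$ is a trigonometric polynomial of degree below the number of midpoints, the midpoint rule integrates it over $[0,1]$ exactly (up to rounding).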

Theorem 2.2 (Lebesgue's Theorem). Let $f \in L^1([0,1])$. As $N \to \infty$, $T_N f$ converges to $f$ in $L^1([0,1])$.
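Lebesgue's Theorem can be watched in action. Assuming, for illustration, the step density $f = 2\cdot\mathbf{1}_{[0,1/2)}$ (whose Fourier coefficients are known in closed form), the sketch below evaluates the Fejér series (2.3) and a midpoint-rule estimate of $\int_0^1 |f - T_N f|$ for increasing $N$; the estimated $L^1$ error shrinks as $N$ grows. Function names and the grid size are our own choices.

```python
import cmath
import math


def coeff_half_uniform(n: int) -> complex:
    """Fourier coefficients of f = 2 * 1_[0, 1/2): 1 at n = 0, 2/(pi i n) for odd n, 0 otherwise."""
    if n == 0:
        return 1.0
    return 2 / (math.pi * 1j * n) if n % 2 else 0.0


def fejer_series(coeff, N: int, x: float) -> float:
    """T_N f(x) = sum_{|n| <= N} (1 - |n|/N) hat{f}(n) e^{2 pi i n x}; real since f is real."""
    total = sum((1 - abs(n) / N) * coeff(n) * cmath.exp(2j * math.pi * n * x)
                for n in range(-N, N + 1))
    return total.real


def l1_error(coeff, f, N: int, grid: int = 1000) -> float:
    """Midpoint-rule estimate of int_0^1 |f(x) - T_N f(x)| dx."""
    h = 1.0 / grid
    return h * sum(abs(f((k + 0.5) * h) - fejer_series(coeff, N, (k + 0.5) * h))
                   for k in range(grid))


def f(x: float) -> float:
    return 2.0 if x < 0.5 else 0.0


errors = {N: l1_error(coeff_half_uniform, f, N) for N in (4, 16, 64)}
print(errors)
```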

Lemma 2.3. Let $f, g \in L^1([0,1])$. Then $T_N(f * g) = (T_N f) * g$.

Proof. The proof follows immediately from the commutative and associative properties of convolution: $T_N(f*g) = (f*g)*F_N = f*(F_N*g) = (f*F_N)*g = (T_N f)*g$. □



We can now prove Theorem 1.1.



6 STEVEN J. MILLER AND MARK J. NIGRINI 

Proof of Theorem 1.1. We first show our condition is sufficient. The density of the sum modulo 1 is $h_M = g_1 * \cdots * g_M$. It suffices to show that, for any $\epsilon > 0$,

$$\lim_{M\to\infty}\int_0^1 |h_M(x) - 1|\,dx < \epsilon. \qquad (2.4)$$

Using Lebesgue's Theorem (Theorem 2.2), choose $N$ sufficiently large so that

$$\int_0^1 |h_1(x) - T_N h_1(x)|\,dx < \frac{\epsilon}{2}. \qquad (2.5)$$

While $N$ was chosen so that (2.5) holds with $h_1$, in fact this $N$ works for all $h_M$ (with the same $\epsilon$). This follows by induction. The base case is immediate (this is just our choice of $N$). Assume now that (2.5) holds with $h_1$ replaced by $h_M$; we must show it holds with $h_1$ replaced by $h_{M+1} = h_M * g_{M+1}$. By Lemma 2.3 we have

$$T_N h_{M+1} = T_N(h_M * g_{M+1}) = (T_N h_M) * g_{M+1}. \qquad (2.6)$$

This implies

$$\begin{aligned}
\int_0^1 |h_{M+1}(x) - T_N h_{M+1}(x)|\,dx &= \int_0^1 \left|(h_M * g_{M+1})(x) - \big((T_N h_M) * g_{M+1}\big)(x)\right|dx\\
&= \int_0^1 \left|\int_0^1 \big(h_M(y) - T_N h_M(y)\big)\,g_{M+1}(x-y)\,dy\right|dx\\
&\le \int_0^1\!\!\int_0^1 |h_M(y) - T_N h_M(y)|\,g_{M+1}(x-y)\,dx\,dy\\
&= \int_0^1 |h_M(y) - T_N h_M(y)|\,dy \;<\; \frac{\epsilon}{2}; \qquad (2.7)
\end{aligned}$$

the interchange of integration above is justified by the absolute value being integrable in the product measure, and the $x$-integral is 1 as $g_{M+1}$ is a probability density.

To show $h_M$ converges to the uniform distribution in $L^1([0,1])$, we must show $\lim_{M\to\infty}\int_0^1 |h_M(x) - 1|\,dx = 0$. Let $N$ and $\epsilon$ be as above. By the triangle inequality we have

$$\int_0^1 |h_M(x) - 1|\,dx \le \int_0^1 |h_M(x) - T_N h_M(x)|\,dx + \int_0^1 |T_N h_M(x) - 1|\,dx. \qquad (2.8)$$

From our choices of $N$ and $\epsilon$, $\int_0^1 |h_M(x) - T_N h_M(x)|\,dx < \epsilon/2$; thus we need only show $\int_0^1 |T_N h_M(x) - 1|\,dx < \epsilon/2$ to complete the proof. As $\hat h_M(0) = 1$,

$$\int_0^1 |T_N h_M(x) - 1|\,dx = \int_0^1 \left|\sum_{\substack{n=-N\\ n\neq 0}}^{N}\left(1 - \frac{|n|}{N}\right)\hat h_M(n)\,e^{2\pi i n x}\right|dx. \qquad (2.9)$$



However, $\hat h_M(n) = \hat g_1(n) \cdots \hat g_M(n)$, which by assumption tends to zero as $M \to \infty$ (as each $\hat g_m(n)$ is at most 1 in absolute value, for each $n$ the absolute value of the product is non-increasing in $M$). For fixed $N$ and $\epsilon$, we may choose $M$ sufficiently large so that $|\hat h_M(n)| < \epsilon/(4N)$ whenever $n \neq 0$ and $|n| \le N$. Since the sum in (2.9) has $2N$ terms, each of absolute value at most $|\hat h_M(n)|$, we obtain

$$\int_0^1 |T_N h_M(x) - 1|\,dx < 2N\cdot\frac{\epsilon}{4N} = \frac{\epsilon}{2},$$

which implies

$$\int_0^1 |h_M(x) - 1|\,dx < \epsilon \qquad (2.10)$$

for $M$ sufficiently large. As $\epsilon$ is arbitrary, this completes the proof of the sufficiency; we now prove this condition is necessary.
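The choice of $M$ in the sufficiency argument can be made explicit when the $g_m$ are identically distributed. Under the illustrative assumption $g_m = 2\cdot\mathbf{1}_{[0,1/2)}$ for every $m$ (so the largest $|\hat g(n)|$ with $0 < |n| \le N$ is $2/\pi$), the hypothetical helper below finds the smallest $M$ with $|\hat h_M(n)| < \epsilon/(4N)$ for all $0 < |n| \le N$.

```python
import math


def coeff_half_uniform(n: int) -> complex:
    """Fourier coefficients of 2 * 1_[0, 1/2): 1 at n = 0, 2/(pi i n) for odd n, 0 otherwise."""
    if n == 0:
        return 1.0
    return 2 / (math.pi * 1j * n) if n % 2 else 0.0


def smallest_M(coeff, N: int, eps: float, M_max: int = 100_000) -> int:
    """Smallest M with |coeff(n)|^M < eps / (4N) for all 0 < |n| <= N (i.i.d. summands)."""
    # For a real density |coeff(-n)| = |coeff(n)|, so checking n = 1..N suffices.
    worst = max(abs(coeff(n)) for n in range(1, N + 1))
    target = eps / (4 * N)
    power, M = worst, 1
    while power >= target:
        if M >= M_max:
            raise ValueError("no M <= M_max works; |coeff(n)| may equal 1 for some n")
        power *= worst
        M += 1
    return M


print(smallest_M(coeff_half_uniform, N=10, eps=0.1))
```

The guard `M_max` matters exactly when the hypothesis of Theorem 1.1 fails, since then some $|\hat g(n)| = 1$ and no $M$ works.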

Assume for some $n_0 \neq 0$ that $\lim_{M\to\infty} |\hat h_M(n_0)| \neq 0$ (where as always $h_M = g_1 * \cdots * g_M$). As the $g_m$ are probability densities, $|\hat g_m(n)| \le 1$; thus the sequence $\{|\hat h_M(n)|\}_{M=1}^{\infty}$ is non-increasing for each $n$, and hence by assumption converges to some number $c_{n_0} \in (0,1]$.

Let $E_M(x) = h_M(x) - 1$; note $\hat E_M(n) = \hat h_M(n)$ for $n \neq 0$. To show $h_M$ does not converge to the uniform distribution in $L^1([0,1])$, it suffices to show that $E_M$ does not converge to the zero function in $L^1([0,1])$. Let $n_0$ be as above. We have

$$|\hat h_M(n_0)| = |\hat E_M(n_0)| = \left|\int_0^1 E_M(x)\,e^{-2\pi i n_0 x}\,dx\right| \ge c_{n_0} > 0. \qquad (2.12)$$



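Therefore, writing $\Phi_M(x) = E_M(x)\,e^{-2\pi i n_0 x}$, at least one of the following integrals is at least $c_{n_0}/2$:

$$\int_{\substack{x\in[0,1]\\ \mathrm{Re}(\Phi_M(x))>0}} \mathrm{Re}(\Phi_M(x))\,dx, \quad \int_{\substack{x\in[0,1]\\ \mathrm{Re}(\Phi_M(x))<0}} \mathrm{Re}(-\Phi_M(x))\,dx, \quad \int_{\substack{x\in[0,1]\\ \mathrm{Im}(\Phi_M(x))>0}} \mathrm{Im}(\Phi_M(x))\,dx, \quad \int_{\substack{x\in[0,1]\\ \mathrm{Im}(\Phi_M(x))<0}} \mathrm{Im}(-\Phi_M(x))\,dx, \qquad (2.13)$$

and $E_M$ cannot converge to the zero function in $L^1([0,1])$. Further, since $|\Phi_M(x)| = |E_M(x)|$, each integral in (2.13) is at most $\int_0^1 |E_M(x)|\,dx$, so we obtain the estimate $\int_0^1 |h_M(x) - 1|\,dx \ge c_{n_0}/2$ on the $L^1$-distance between the uniform distribution and $h_M$. □

Figure 1. Distribution of digits (base 10) of 1000 products $X_1 \cdots X_{1000}$, where $g_{10,m} = \phi_{11^m}$.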

The behavior is non-Benford if the conditions of Theorem 1.2 are violated. It is enough to show that we can find a sequence of densities $g_{B,m}$ such that $\lim_{M\to\infty}\prod_{m=1}^{M}\hat g_{B,m}(1) \neq 0$. We are reduced to searching for an infinite product that is non-zero; we also need each term to be at most 1 in absolute value, as the Fourier coefficients of a probability density are dominated by 1. A standard example is $\prod_m c_m$, where $c_m = \frac{m^2+2m}{(m+1)^2}$; the limit of this product is 1/2. Thus as long as $|\hat g_{B,m}(1)| \ge \frac{m^2+2m}{(m+1)^2}$, the conclusion of Theorem 1.2 will not hold for the products of the associated random variables; analogous reasoning yields a sum of independent random variables modulo 1 which does not converge to the uniform distribution.
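The telescoping behind the example $\prod_m c_m$ with $c_m = \frac{m^2+2m}{(m+1)^2}$ can be checked with exact rational arithmetic: since $c_m = \frac{m}{m+1}\cdot\frac{m+2}{m+1}$, the partial product is $\frac{M+2}{2(M+1)}$, which tends to $1/2$. The helper name below is hypothetical.

```python
from fractions import Fraction


def partial_product(M: int) -> Fraction:
    """prod_{m=1}^{M} (m^2 + 2m) / (m + 1)^2, computed exactly."""
    p = Fraction(1)
    for m in range(1, M + 1):
        p *= Fraction(m * m + 2 * m, (m + 1) ** 2)
    return p


for M in (1, 10, 1000):
    print(M, partial_product(M), float(partial_product(M)))
```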

Example 2.4 (Non-Benford Behavior of Products). Consider 



$\phi_m$ is non-negative and integrates to 1. As $m \to \infty$ we have $|\hat\phi_m(1)| \to 1$ because the density becomes concentrated at 1/8 (direct calculation gives $\hat\phi_m(1) = e^{-2\pi i/8} + O(m^{-2})$). Let $X_1, \dots, X_M$ be independent random variables where the associated densities $g_{B,m}$ of $\log_B M_B(|X_m|)$ are $\phi_{11^m}$. The behavior is non-Benford (see Figure 1). Note, however, that if each $X_m$ had the common distribution $\phi_i$ for any fixed $i$, then in the limit the product will satisfy Benford's law.

Remark 2.5. Generalizations of Theorem 1.1 hold for more general sums of random variables. Instead of $Y_1 + \cdots + Y_M$ we may study $\eta_1 Y_1 + \cdots + \eta_M Y_M$, where each $\eta_m$ is a random variable taking values in $\{-1, 1\}$; the proof follows from the observation that if $Y_m$ has density $g_m(y)$ then $-Y_m$ (modulo 1) has density $g_m(1-y)$.




(2.14) 
3. Analysis of Sums of Discrete Random Variables

Many results from Fourier analysis do not apply if the random variables are discrete; Lebesgue's Theorem cannot be correct for a point mass, as the density is concentrated on a set of measure zero. Let $\delta_\alpha(x)$ be a unit point mass⁴ at $\alpha$. Its Fourier coefficients are $\hat\delta_\alpha(n) = e^{-2\pi i n\alpha}$, and simple algebra shows that its Fejér series is

$$T_N\delta_\alpha(x) = \frac{1}{N}\cdot\frac{e^{-2\pi i (N-1)(x-\alpha)}\left(e^{2\pi i N(x-\alpha)} - 1\right)^2}{\left(e^{2\pi i (x-\alpha)} - 1\right)^2}. \qquad (3.1)$$

For $x \neq \alpha$, $\lim_{N\to\infty} T_N\delta_\alpha(x) = \delta_\alpha(x) = 0$; moreover, for $x$ near $\alpha$ we have $|T_N\delta_\alpha(x)| \approx N$. Instead of convergence in $L^1([0,1])$ we have weak convergence: for any Schwartz function $\phi$,

$$\lim_{N\to\infty}\int_0^1 T_N\delta_\alpha(x)\,\phi(x)\,dx = \int_0^1 \delta_\alpha(x)\,\phi(x)\,dx = \phi(\alpha). \qquad (3.2)$$
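The weak convergence in (3.2) can be checked numerically. Since $T_N\delta_\alpha(x) = F_N(x-\alpha)$, pairing with a trigonometric test function gives an exact answer: for the illustrative choice $\phi(x) = \cos(2\pi x)$ one has $\int_0^1 T_N\delta_\alpha(x)\phi(x)\,dx = (1 - 1/N)\cos(2\pi\alpha) \to \phi(\alpha)$. The sketch below (function names, $\alpha$, and $\phi$ are our own assumptions) computes the pairing by the midpoint rule.

```python
import math


def fejer_kernel(x: float, N: int) -> float:
    """F_N(x) = 1 + 2 sum_{n=1}^{N} (1 - n/N) cos(2 pi n x)."""
    return 1.0 + 2.0 * sum((1 - n / N) * math.cos(2 * math.pi * n * x) for n in range(1, N + 1))


def pair_with_test_function(alpha: float, N: int, phi, points: int = 2000) -> float:
    """Midpoint-rule value of int_0^1 T_N delta_alpha(x) phi(x) dx, using T_N delta_alpha(x) = F_N(x - alpha)."""
    h = 1.0 / points
    return h * sum(fejer_kernel((k + 0.5) * h - alpha, N) * phi((k + 0.5) * h)
                   for k in range(points))


def phi(x: float) -> float:
    return math.cos(2 * math.pi * x)


alpha = 0.3
for N in (2, 10, 50):
    print(N, pair_with_test_function(alpha, N, phi), phi(alpha))
```

No $L^1$ statement is possible here: $T_N\delta_\alpha$ has a peak of height $N$ at $\alpha$, yet its pairing with each fixed smooth $\phi$ converges.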

Sketch of the proof of Theorem 1.3. We argue as in Theorem 1.1. Note Lemma 2.3 holds if $f$ and $g$ are sums of point masses. Instead of using Lebesgue's Theorem, we use weak convergence: given an $\epsilon > 0$ and a Schwartz function $\phi(x)$, by weak convergence there is an $N$ such that

$$\left|\int_0^1 \big(h_1(x) - T_N h_1(x)\big)\,\phi(x)\,dx\right| < \frac{\epsilon}{2}. \qquad (3.3)$$

This is the generalization of (2.5). Further, we may assume (3.3) holds with $\phi(x)$ replaced with $\phi_{\alpha_{k,m}}(x) = \phi(x + \alpha_{k,m})$ for any $\alpha_{k,m} \in A$. This is only true because $A$ is finite; while $N = N(\phi)$ depends on $\phi$, as there are only finitely many test functions $\phi_{\alpha_{k,m}}$ we may take $N = \max N(\phi_{\alpha_{k,m}})$. A similar analysis as before shows (3.3) also holds with $h_1$ replaced by $h_M$. The key step in the induction is

$$\begin{aligned}
\int_0^1\!\!\int_0^1 \big(h_M(y) - T_N h_M(y)\big)\,g_{M+1}(x-y)\,\phi(x)\,dx\,dy
&= \int_0^1 \big(h_M(y) - T_N h_M(y)\big)\sum_{k=1}^{r_{M+1}} w_{k,M+1}\,\phi(y + \alpha_{k,M+1})\,dy\\
&= \sum_{k=1}^{r_{M+1}} w_{k,M+1}\int_0^1 \big(h_M(y) - T_N h_M(y)\big)\,\phi_{\alpha_{k,M+1}}(y)\,dy, \qquad (3.4)
\end{aligned}$$

which, as the $w_{k,M+1}$ are non-negative and sum to 1, is less than $\epsilon/2$ in absolute value. Arguing as in Theorem 1.1 completes the proof. □



4 Thus $\delta_\alpha(x)$ is a Dirac delta functional; if $\phi(x)$ is a Schwartz function then $\int \delta_\alpha(x)\,\phi(x)\,dx$ is defined to be $\phi(\alpha)$.
Appendix A. Comparison with Alternate Techniques

We discuss an alternate proof of Theorem 1.2, applying the standard Central Limit Theorem to the sum $\log_B|X_1| + \cdots + \log_B|X_M|$ and noting that as the variance of a Gaussian increases to infinity, the Gaussian becomes uniformly distributed modulo 1. A significant drawback of a proof by the Central Limit Theorem is the requirement (at a minimum) that the variance of each $\log_B|X_m|$ be finite. This is a very weak condition, and in fact many random variables $X$ with infinite variance (such as Pareto or modified Cauchy distributions) have $\log_B|X|$ with finite variance; however, there are distributions where $\log_B|X|$ has infinite variance.

To a density $f$ on $[0,\infty)$ we associate the density $f_B$ of the mantissa. Explicitly, the probability that $X$ has mantissa (base $B$) in $[1, s)$ is just

$$\int_1^s f_B(t)\,dt = \sum_{m=-\infty}^{\infty}\int_{B^m \le x < s\cdot B^m} f(x)\,dx. \qquad (A.1)$$

Let $X$ be the random variable with density

$$f_a(x) = \begin{cases} \dfrac{a}{x\,(\log x)^{a+1}} & \text{if } x \ge e,\\[1ex] 0 & \text{otherwise.}\end{cases} \qquad (A.2)$$

This is a probability distribution for $a > 0$, and is a modification of a Pareto distribution; see [Mi] for some applications and properties of this distribution. We study the distribution of the digits base $e$; analogous results hold for other bases. The density of $Y = \log X$ is $g(y) = a y^{-(a+1)}$ for $y \ge 1$ and 0 otherwise. For $a \in (0, 2]$ the random variable $Y$ has infinite variance, and thus we cannot prove the Benford behavior of products through the Central Limit Theorem; however, we can show the random variable $X$ does satisfy the conditions of Theorem 1.2.
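A simulation makes the contrast with the Central Limit Theorem visible. For $a = 1.5$ the variable $Y = \log X$ has infinite variance, yet $(Y_1 + \cdots + Y_M) \bmod 1$, which is the natural logarithm of the base-$e$ mantissa of $X_1 \cdots X_M$, is already close to uniform for moderate $M$. The sketch samples $Y$ by inversion, $Y = U^{-1/a}$ with $U$ uniform on $(0,1]$, since $\Pr(Y > y) = y^{-a}$ for $y \ge 1$; the sample sizes, seed, and function name are illustrative assumptions.

```python
import random


def log_mantissa_of_product(a: float, num_factors: int, rng: random.Random) -> float:
    """(Y_1 + ... + Y_M) mod 1, where each Y_m = U_m^(-1/a) has density a y^-(a+1) on y >= 1."""
    total = 0.0
    for _ in range(num_factors):
        y = (1.0 - rng.random()) ** (-1.0 / a)   # U in (0, 1], so y >= 1
        total = (total + y % 1.0) % 1.0          # reduce as we go to keep precision
    return total


rng = random.Random(2024)
a, num_factors, trials = 1.5, 10, 20_000
samples = [log_mantissa_of_product(a, num_factors, rng) for _ in range(trials)]
bins = [0] * 4
for s in samples:
    bins[min(int(4 * s), 3)] += 1
print([b / trials for b in bins])   # each entry should be near 0.25
```

Working with $Y$ modulo 1 rather than with $X = e^Y$ avoids overflow, since the heavy tail of $Y$ makes $X$ astronomically large.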

Let $F_{e,a}$ be the cumulative distribution function of the digits (base $e$) associated to the density $f_a$ of (A.2), and let $f_{e,a}$ be the corresponding density. We assume $a > 1$ below to ensure convergence. By (A.1) we have

$$F_{e,a}(s) = \int_1^s f_{e,a}(t)\,dt = \sum_{m=0}^{\infty}\int_{e^m \le x < s\cdot e^m} f_a(x)\,dx, \qquad (A.3)$$

with $s \in [1, e)$. The $m = 0$ term vanishes because $f_a$ is zero on $[1, e)$, and a simple integration gives

$$F_{e,a}(s) = \sum_{m=1}^{\infty}\left(\frac{1}{m^a} - \frac{1}{\log^a(s\cdot e^m)}\right); \qquad (A.4)$$

as $a > 1$, this may be split into two convergent series, $\sum_{m\ge 1} m^{-a}$ and $\sum_{m\ge 1}\log^{-a}(s\cdot e^m)$. The derivative of the $s$-dependent series is the sum of the derivatives of the individual summands, which follows from the rapid decay of the summands (see, for example, Corollary 7.3 of [La]). Differentiating the cumulative distribution function $F_{e,a}$ gives the density

$$f_{e,a}(s) = a\sum_{m=1}^{\infty}\frac{1}{s\,\log^{a+1}(s\cdot e^m)}, \qquad s \in [1, e). \qquad (A.5)$$

As $a > 1$, for $m \ge 1$ the $m$th summand is bounded by $m^{-(a+1)}$. Thus the series for $f_{e,a}(s)$ converges and is uniformly bounded for all $s$. A simple analysis shows that the conditions of Theorem 1.2 are satisfied for $a \in (1, 2]$.
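The series for $F_{e,a}$ and the density (A.5) are straightforward to evaluate, using $\log(s\cdot e^m) = m + \log s$. This sketch truncates both after `terms` summands (a parameter we introduce; the neglected tails are of order `terms**(-a)`), and checks that $F_{e,a}(1) = 0$, that $F_{e,a}(s) \to 1$ as $s \to e$ (the series telescopes there), and that $f_{e,a}$ agrees with a numerical derivative of $F_{e,a}$. Printing $f_{e,a}(s)$ next to the Benford density $1/s$ shows that a single such $X$ is not Benford base $e$; Theorem 1.2 concerns products of many. Function names and parameter values are hypothetical.

```python
import math


def mantissa_cdf(s: float, a: float, terms: int = 10_000) -> float:
    """Truncated F_{e,a}(s) = sum_{m=1}^{terms} (m^-a - (m + log s)^-a), for s in [1, e]."""
    t = math.log(s)
    return sum(m ** -a - (m + t) ** -a for m in range(1, terms + 1))


def mantissa_density(s: float, a: float, terms: int = 10_000) -> float:
    """Truncated (A.5): f_{e,a}(s) = (a / s) * sum_{m=1}^{terms} (m + log s)^-(a + 1)."""
    t = math.log(s)
    return (a / s) * sum((m + t) ** -(a + 1) for m in range(1, terms + 1))


a = 1.5
for s in (1.0, 1.5, 2.0, 2.5):
    print(s, mantissa_density(s, a), 1 / s)   # compare with the Benford density 1/s
```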

The reason the Central Limit Theorem fails for densities such as that in (A.2) is that it tries to provide too much information. The Central Limit Theorem tries to give us the limiting distribution of $\log_B|X_1 \cdots X_M| = \log_B|X_1| + \cdots + \log_B|X_M|$; however, as we are only interested in the distribution of the digits of $X_1 \cdots X_M$, this is more information than we need.